Towards Missing Data Imputation: A Study of Fuzzy K-means Clustering Method

Authors

Dan Li

Jitender S. Deogun

William Spaulding

Bill Shuart

Abstract

In this paper, we present a missing data imputation method based on clustering, one of the most popular techniques in Knowledge Discovery in Databases (KDD). We combine clustering with soft computing, which tends to be more tolerant of imprecision and uncertainty, and apply a fuzzy clustering algorithm to deal with incomplete data. Our experiments show that the fuzzy imputation algorithm performs better than the basic clustering algorithm.
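
As a rough illustration of the idea in the abstract, the sketch below fills missing values with a fuzzy K-means style procedure: each record belongs to every cluster with some membership degree, and a missing attribute is estimated as the membership-weighted average of the cluster centroids. This is not the authors' implementation. The function name `fuzzy_kmeans_impute`, the parameters `k`, `m`, `n_iter` and `seed`, the column-mean starting fill, and the choice to measure distances over observed attributes only are all assumptions made here for illustration. The sketch also assumes every column has at least one observed value.

```python
import numpy as np

def fuzzy_kmeans_impute(X, k=2, m=2.0, n_iter=100, seed=0):
    """Fill NaN entries of X using fuzzy K-means memberships (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)              # True where a value is present
    w = observed.astype(float)           # 1.0 for observed, 0.0 for missing

    # Provisional fill with column means so initial centroids are complete vectors.
    filled = np.where(observed, X, np.nanmean(X, axis=0))

    # Assumed initialisation: k distinct rows picked at random.
    rng = np.random.default_rng(seed)
    centroids = filled[rng.choice(len(filled), size=k, replace=False)].copy()

    def memberships(c):
        # Squared distance to each centroid, counting observed attributes only.
        diff = filled[:, None, :] - c[None, :, :]
        d2 = (diff ** 2 * w[:, None, :]).sum(axis=2) + 1e-12
        # Standard fuzzy membership formula; each row sums to 1.
        inv = d2 ** (-1.0 / (m - 1.0))
        return inv / inv.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        u = memberships(centroids)
        um = u ** m
        # Each centroid coordinate is a weighted mean of the observed values only.
        centroids = (um.T @ (filled * w)) / (um.T @ w)

    u = memberships(centroids)
    # Missing entries become the membership-weighted average of centroid values.
    imputed = np.where(observed, X, u @ centroids)
    return imputed, u, centroids
```

Calling `fuzzy_kmeans_impute(X, k=2)` on an array that uses `np.nan` for missing entries returns the completed array together with the membership matrix and the centroids. The fuzzifier `m` controls how soft the assignment is: values close to 1 approach a hard K-means assignment, while larger values spread each record's membership more evenly across clusters.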

Similar resources

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Frequent researches have been done on the use of diffe...

A Fuzzy Clustering Approach for Missing Value Imputation with Non-Parameter Outlier Test

Missing value is a challenging issue in data mining, as information deficiency negatively affects both data quality and reliability. This paper focuses on an algorithm of a fuzzy clustering approach for missing value imputation with noisy data immunity. The PCFKMI (Pre-Clustering based Fuzzy K-Means Imputation) method aggregates data instances to more accurate clusters for further appropriate e...

A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm

Missing values in datasets should be extracted from the datasets or should be estimated before they are used for classification, association rules or clustering in the preprocessing stage of data mining. In this study, we utilize a fuzzy c-means clustering hybrid approach that combines support vector regression and a genetic algorithm. In this method, the fuzzy clustering parameters, cluster si...

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

On a Fuzzy c-means Algorithm for Mixed Incomplete Data Using Partial Distance and Imputation

The focus of fuzzy c-means clustering method is normally used on numerical data. However, most data existing in databases are both categorical and numerical. To date, clustering methods have been developed to analyze only complete data. Although we sometimes encounter data sets that contain one or more missing feature values (incomplete data), traditional clustering methods cannot be used for s...

Publication date: 2004
